PSSP-RFE: Accurate Prediction of Protein Structural Class by Recursive Feature Extraction from PSI-BLAST Profile, Physical-Chemical Property and Functional Annotations
Abstract
Protein structure prediction is critical to the functional annotation of the massive and rapidly accumulating body of biological sequences, which creates an urgent need for high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction has become an increasingly challenging task. For most homology-based approaches, structural class prediction accuracy is sufficiently high on high-similarity datasets but still far from satisfactory on low-similarity datasets, i.e., those below 40% pairwise sequence similarity. We therefore present a novel method for accurate and reliable protein structural class prediction on both high- and low-similarity datasets. The method is based on a Support Vector Machine (SVM) combined with integrated features from the position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated features by recursively removing the feature with the lowest ranking score. The final top-ranked features selected by SVM-RFE are then input into the SVM engines to predict the structural class of a query protein. To validate the method, jackknife tests were applied to seven widely used benchmark datasets, yielding overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods for protein structural classification, especially on low-similarity datasets.
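The feature-ranking and validation steps summarized in the abstract can be illustrated with a small sketch. This is a hedged, pure-Python illustration, not the authors' implementation: it assumes a linear SVM trained by full-batch subgradient descent on the L2-regularized hinge loss as a stand-in for the SVM engine, uses the common SVM-RFE ranking score w_i² (the squared weight of feature i), and implements the jackknife test as leave-one-out evaluation. The function names (`train_linear_svm`, `svm_rfe`, `jackknife_accuracy`), hyperparameters and toy data are hypothetical, and only binary labels are handled; the multi-class structural-class setting would need an extension such as a one-vs-rest wrapper.

```python
"""Hypothetical sketch of SVM-RFE ranking plus jackknife (leave-one-out) testing.

Assumptions: a linear SVM fitted by full-batch subgradient descent replaces the
paper's SVM engine; labels are binary (+1 / -1); the toy data is invented.
"""
from typing import List, Tuple

Vector = List[float]


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 200) -> Tuple[Vector, float]:
    """Minimise lam/2*||w||^2 + mean(max(0, 1 - y*(w.x + b))) by subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 regulariser
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w: Vector, b: float, x: Vector) -> int:
    """Sign of the linear decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def svm_rfe(X: List[Vector], y: List[int]) -> List[int]:
    """Return feature indices ranked best-first, removing one feature per round."""
    remaining = list(range(len(X[0])))
    eliminated: List[int] = []
    while remaining:
        Xs = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(Xs, y)
        scores = [wj * wj for wj in w]  # ranking score c_i = w_i^2
        worst = min(range(len(remaining)), key=lambda k: scores[k])
        eliminated.append(remaining.pop(worst))  # drop the lowest-scoring feature
    return eliminated[::-1]  # the last feature removed is ranked first


def jackknife_accuracy(X: List[Vector], y: List[int]) -> float:
    """Leave-one-out: train on n-1 samples, test the held-out one, repeat n times."""
    correct = 0
    for i in range(len(X)):
        w, b = train_linear_svm(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict(w, b, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 separates the classes, features 1-2 are noise.
    X = [[2.0, 0.1, -0.3], [1.5, -0.2, 0.4], [1.8, 0.3, 0.1],
         [-2.0, 0.2, -0.1], [-1.6, -0.1, 0.3], [-1.9, 0.0, -0.4]]
    y = [1, 1, 1, -1, -1, -1]
    ranking = svm_rfe(X, y)
    X_top = [[row[j] for j in ranking[:1]] for row in X]  # keep the top feature
    print("ranking:", ranking)
    print("jackknife accuracy:", jackknife_accuracy(X_top, y))
```

Removing a single feature per round follows the abstract's description; with thousands of features, implementations often remove a fraction per round to save time. For brevity the toy ranks features once on all samples; running SVM-RFE inside each jackknife fold avoids an optimistic accuracy estimate.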
Similar resources
Prediction of Protein Structural Class Based on Gapped-Dipeptides and a Recursive Feature Selection Approach
Prior knowledge of a protein's structural class may offer useful clues to its function as well as its tertiary structure. Although significant efforts have been made to find a fast and effective computational approach to this problem, it remains a challenging topic in bioinformatics. The position-specific score matrix (PSSM) profile has been shown ...
A Protein Structural Classes Prediction Method based on Various Information Fusion
Knowledge of a protein's structural class plays an important role in understanding its folding mode. Predicting protein structural classes, a transitional stage from secondary to tertiary structure, is considered an important and challenging task. In this paper, the PSI-BLAST profile is used to extract the evolutionary information of proteins, and t...
ConFunc - functional annotation in the twilight zone
MOTIVATION: The success of genome sequencing has resulted in many protein sequences without functional annotation. We present ConFunc, an automated Gene Ontology (GO)-based protein function prediction approach, which uses conserved residues to generate sequence profiles to infer function. ConFunc splits sets of sequences identified by PSI-BLAST into sub-alignments according to their GO annotation...
A Discriminative Method for Protein Remote Homology Detection Based on N-nary Profiles
Protein homology detection is a key problem in computational biology. In this paper, a novel building block for proteins, called the N-nary profile, which contains the evolutionary information of protein sequence frequency profiles, is presented. The protein sequence frequency profiles calculated from the multiple sequence alignments output by PSI-BLAST are converted into N-nary profiles. Such...
Alternative approach to protein structure prediction based on sequential similarity of physical properties.
The relationship between protein sequence and structure arises entirely from amino acid physical properties. An alternative method is therefore proposed to identify homologs in which residue equivalence is based exclusively on the pairwise physical property similarities of sequences. This approach, the property factor method (PFM), is entirely different from those in current use. A comparison i...